canon: add telemetry-validation-gate constraint #210

Merged
klappy merged 2 commits into main from feat/canon-telemetry-validation-gate on May 16, 2026

@klappy (Owner) commented May 15, 2026

Adds tier-1 canon defining the single gate for verifying the telemetry Emission Contract.

Why this exists

The handoff at klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness framed validation of the post-PR-#157 cutover as a "24-hour soak validator". In session it became clear that this framing is incoherent for oddkit specifically: there is no organic load to soak against, and "wait for organic ≥95% coverage on every tool" is unmeetable when the only traffic is manufactured smoke traffic.

The actual question — does the wrapper emit the numbers we expect for the payloads we send? — is deterministic and answerable in a single smoke pass per surface.

What the gate is

  1. Enumerate every server.tool() registration.
  2. Drive one synthetic call per tool through each active surface (main preview, prod).
  3. Locally compute expected bytes_in, bytes_out, tokens_in, tokens_out (cl100k_base).
  4. Query telemetry; match emitted to expected; pass if every field is within the ±5% tolerance (tokenizer noise itself is about 3–4%).

No time bound. No sample threshold beyond 1/tool/surface. No statistical ceremony.
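
As a minimal sketch of step 3, the expected values for one smoke call could be computed as below. It follows the canon document in this PR, which measures the JSON-stringified `args` object and `{ content: [...] }` envelope. The names `expectedUsage`, `ExpectedUsage`, and `CountTokens` are hypothetical, and the tokenizer is injected rather than imported, on the assumption that a real driver would pass in a `cl100k_base` encoder.

```typescript
// Token counter is injected so the sketch stays dependency-free.
// Assumption: in real use this would wrap a cl100k_base encoder.
type CountTokens = (text: string) => number;

interface ExpectedUsage {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
}

// UTF-8 byte length of a string (not its UTF-16 code-unit length).
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Compute expected values from the in-memory args object and the
// { content: [...] } envelope, not from the HTTP request/response bodies.
function expectedUsage(
  args: unknown,
  contentEnvelope: unknown,
  countTokens: CountTokens,
): ExpectedUsage {
  const serializedArgs = JSON.stringify(args);
  const serializedEnvelope = JSON.stringify(contentEnvelope);
  return {
    bytes_in: utf8ByteLength(serializedArgs),
    bytes_out: utf8ByteLength(serializedEnvelope),
    tokens_in: countTokens(serializedArgs),
    tokens_out: countTokens(serializedEnvelope),
  };
}
```

Byte length and string length diverge for non-ASCII payloads: `{ q: "é" }` serializes to nine characters but ten UTF-8 bytes, which is why the sketch encodes before counting.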

Relationship to release-validation-gate

Rule 2 there triggers on response-envelope changes, tool add/remove, governance-read changes, and orchestrate.ts edits. Wrapper-only changes touch none of these — callers see identical responses. This PR notes the orchestrator may smoke-verify directly per this new gate when Rule 2 is not triggered. If a future wrapper change does touch load-bearing surface in the Rule 2 sense, both gates apply.
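
The interaction between the two gates can be read as a small decision function. The sketch below is illustrative only: `ChangeSummary`, its field names, and `gatesFor` are hypothetical, and the trigger list is limited to the four Rule 2 triggers named in the paragraph above.

```typescript
// Hypothetical summary of what a change touches; field names are illustrative.
interface ChangeSummary {
  touchesTelemetryWrapper: boolean;
  changesResponseEnvelope: boolean;
  addsOrRemovesTools: boolean;
  changesGovernanceReads: boolean;
  editsOrchestrate: boolean;
}

type Gate = "telemetry-validation-gate" | "release-validation-gate Rule 2";

// Rule 2 fires when any of the load-bearing triggers listed above is present.
function triggersRule2(change: ChangeSummary): boolean {
  return (
    change.changesResponseEnvelope ||
    change.addsOrRemovesTools ||
    change.changesGovernanceReads ||
    change.editsOrchestrate
  );
}

// Return every gate that applies; a change can trigger both at once.
function gatesFor(change: ChangeSummary): Gate[] {
  const gates: Gate[] = [];
  if (change.touchesTelemetryWrapper) gates.push("telemetry-validation-gate");
  if (triggersRule2(change)) gates.push("release-validation-gate Rule 2");
  return gates;
}
```

A wrapper-only change yields just the telemetry gate; a wrapper change that also alters the response envelope yields both.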

Gauntlet (Writing Canon checklist)

  • Title test: pass
  • Blockquote test: pass — full compressed argument
  • Metadata test: pass — full file paths in derives_from
  • Summary test: pass — self-contained
  • Header scan test: pass — sequence tells the story
  • No buried claims: pass
  • Axiom space test: pass — ~2K words, similar order to release-validation-gate
  • Ghost writer test: pass — caught one negation-parallelism instance pre-commit and rewrote
  • Em-dash density: non-clustering (counts checked per section)

Receipts

  • klappy://canon/observations/2026-05-14-telemetry-coverage-gap-quantified — diagnostic
  • klappy://canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern — decision record
  • klappy://canon/observations/performed-prudence-anti-pattern — the failure mode this gate avoids
  • klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness — superseded soak framing

Note

Low Risk
Low risk: adds a new tier-1 canon constraint document only, with no code or runtime behavior changes.

Overview
Adds a new tier-1 canon constraint, telemetry-validation-gate, defining the required release gate for validating the telemetry Emission Contract.

The gate replaces time-bound/organic-traffic “soak” expectations with a single synthetic smoke call per registered tool per deployment surface, and requires comparing emitted bytes_in/bytes_out/tokens_in/tokens_out against locally computed ground truth (with an explicit noise tolerance and SSE streaming exception). It also clarifies how this gate interacts with release-validation-gate Rule 2 (wrapper-only changes can be verified via smoke without triggering fresh-context validation unless load-bearing surface changes).
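
The comparison described here, including the noise tolerance and the SSE exception, might look like the following sketch. The names `withinNoise` and `mismatchedFields` are hypothetical, and the 5% default comes from the pass criterion in the canon document. Treating an expected value of zero as requiring an exact zero is an assumption of this sketch, made because a relative tolerance is undefined at zero.

```typescript
type Field = "bytes_in" | "bytes_out" | "tokens_in" | "tokens_out";
type Metered = Record<Field, number>;

const FIELDS: Field[] = ["bytes_in", "bytes_out", "tokens_in", "tokens_out"];

// Relative comparison against the locally computed expectation.
function withinNoise(emitted: number, expected: number, tolerance = 0.05): boolean {
  // Assumption: an expected 0 (the SSE-streamed case) must be emitted as exactly 0.
  if (expected === 0) return emitted === 0;
  return Math.abs(emitted - expected) / expected <= tolerance;
}

// Return the fields that fall outside tolerance; an empty list means the call passes.
function mismatchedFields(emitted: Metered, expected: Metered, tolerance = 0.05): Field[] {
  return FIELDS.filter((field) => !withinNoise(emitted[field], expected[field], tolerance));
}
```

Returning the offending field names rather than a bare boolean keeps a "wrong number" failure actionable, since the report says which of the four values to investigate.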

Reviewed by Cursor Bugbot for commit e46ca20.

Adds tier-1 canon defining the single smoke-and-verify gate for the
telemetry Emission Contract: enumerate registered tools, drive one
synthetic call per tool per surface, compare emitted bytes/tokens against
locally-computed expectations. No time bound. No statistical threshold.
Sample size of 1 per tool per surface is sufficient because the wrapper
is deterministic.

Supersedes the implicit '24-hour soak' framing in
odd/handoffs/2026-05-14-telemetry-coverage-completeness, which assumed
organic load oddkit does not actually receive.

Notes that release-validation-gate Rule 2 is arguably not triggered by
wrapper-only changes (no response-envelope change, no tool
add/remove). If a future wrapper change touches load-bearing surface in
the Rule 2 sense, both gates apply.

derives_from telemetry-governance, release-validation-gate,
performed-prudence-anti-pattern.

github-actions Bot commented May 15, 2026

Canon Quality — Frontmatter Schema ✅

All 41 file(s) in writings/ conform to klappy://canon/meta/frontmatter-schema.

Validator: scripts/validate-frontmatter.py · Canon: klappy://canon/constraints/frontmatter-validation-before-merge · Run: #153


github-actions Bot commented May 15, 2026

Canon Quality — oddkit_audit

No dead klappy:// references or legacy link patterns found in writings/. 42 files scanned.

Spec: klappy://docs/oddkit/specs/oddkit-audit · Workflow: .github/workflows/canon-quality.yml · Run: #153

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Gate procedure computes expected values from wrong inputs
    • Updated steps 2 and 3 of the gate procedure to record and measure the in-memory args object and { content: [...] } envelope (matching the wrapper's actual emission inputs per telemetry-governance Rule 2) instead of the full HTTP request/response bodies.
Preview (e46ca204fd)
diff --git a/canon/constraints/telemetry-validation-gate.md b/canon/constraints/telemetry-validation-gate.md
new file mode 100644
--- /dev/null
+++ b/canon/constraints/telemetry-validation-gate.md
@@ -1,0 +1,114 @@
+---
+uri: klappy://canon/constraints/telemetry-validation-gate
+title: "Telemetry Validation Gate — Smoke Every Tool, Verify Every Number"
+audience: canon
+exposure: nav
+tier: 1
+voice: neutral
+stability: evolving
+tags: ["canon", "constraint", "telemetry", "validation", "smoke-test", "wrapper-correctness", "release-pipeline", "analytics-engine"]
+epoch: E0008
+date: 2026-05-15
+derives_from: "canon/constraints/telemetry-governance.md, canon/constraints/release-validation-gate.md, canon/observations/performed-prudence-anti-pattern.md"
+complements: "canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern.md, canon/observations/2026-05-14-telemetry-coverage-gap-quantified.md"
+governs: "Every release that touches the telemetry Emission Contract surface in oddkit and TruthKit"
+status: active
+---
+
+# Telemetry Validation Gate — Smoke Every Tool, Verify Every Number
+
+> The Emission Contract requires every registered tool to emit accurate metered usage on every call. Verifying it is one smoke pass per surface: hit every tool, compare the emitted `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` against the request and response that were actually sent. If the numbers match expectations within tokenizer noise (3–4% for `cl100k_base`), the wrapper is working. There is no soak period, no organic-load threshold, no statistical sample bar. Synthetic traffic is the only traffic; the wrapper is deterministic; one call per tool is sufficient.
+
+---
+
+## Summary — Stop Pretending Sample Size Buys Confidence
+
+oddkit's hosted service does not see enterprise-scale organic traffic. Real consumers number in the low single digits at any given moment, and most of those are the maintainer themselves. A validation model built around "wait for 24 hours of organic load and check per-tool coverage at 95%" is performed prudence — it inflates statistical ceremony around a question that does not need statistics to answer.
+
+The actual question is: does the per-tool wrapper emit the correct metered values when a known payload passes through it? That question is deterministic. The wrapper is code. Either it reads the JSON-stringified args and envelope, runs `cl100k_base` over them, and writes the result to Analytics Engine — or it doesn't. One call with a known input and known output answers the question completely.
+
+The gate is therefore: drive a synthetic smoke pass across every registered tool on every active deployment surface (main preview and prod after promotion). For each call, compare the emitted numeric fields against what the smoke driver actually sent and received. Tokenizer noise of 3–4% for English-prose payloads is the only legitimate variance; anything else is a bug.
+
+Sample size is one per tool per surface. Increase it for operator margin if desired, but the canon bar is one. There is no time bound. There is no organic-load requirement. If the smoke pass shows accurate numbers across every tool, the wrapper is verified.
+
+---
+
+## The Gate
+
+**When:** After any PR touching `withTelemetry`, tool registration, or the emission envelope is deployed to a surface — main preview after merge to `main`, or prod after the `main → prod` promotion. Run the gate against each surface the change reaches, before declaring that surface verified.
+
+**Question it answers:** Does the wrapper emit accurate `bytes_in`, `bytes_out`, `tokens_in`, `tokens_out` for every registered tool?
+
+**Procedure:**
+
+1. Enumerate every `server.tool()` registration in `workers/src/index.ts`. This is the smoke target list.
+2. Drive one synthetic call per tool through the surface's `/mcp` endpoint. Record the exact `args` object sent (the JSON-RPC `params.arguments` payload) and the exact `{ content: [...] }` envelope returned by the handler — not the full HTTP request/response bodies, which include JSON-RPC framing the wrapper does not see.
+3. For each call, compute the expected values locally against the same in-memory values the wrapper measures per `klappy://canon/constraints/telemetry-governance` Rule 2: `bytes_in = utf8_byte_length(JSON.stringify(args))`, `bytes_out = utf8_byte_length(JSON.stringify(content_envelope))`, `tokens_in = cl100k_count(JSON.stringify(args))`, `tokens_out = cl100k_count(JSON.stringify(content_envelope))`. For SSE-streamed responses, expected `bytes_out = 0` and `tokens_out = 0` per the Emission Contract.
+4. Query `oddkit_telemetry` with `event_type = 'tool_call'`, `worker_version = <surface-version>`, and a timestamp window covering the smoke run.
+5. Match each emitted row to the corresponding smoke call (by tool name and timing). Compare emitted versus expected on all four fields.
+
+**Pass:** Every registered tool appears in the telemetry dataset, and every emitted numeric field is within ±5% of the expected value computed locally. The ±5% tolerance leaves margin over the 3–4% tokenizer noise expected for `cl100k_base`.
+
+**Fail (missing tool):** Any registered tool is absent from the dataset after smoke. The wrapper is not attached to that registration. Block downstream work on this surface; fix forward.
+
+**Fail (wrong number):** Any emitted field is off by more than the noise floor. The wrapper is attached but emission is inaccurate. Investigate; fix; re-smoke.
+
+**Sample threshold:** One call per tool per surface is sufficient. The wrapper is deterministic; a second call with the same input emits the same output. Higher sample counts are operator discretion for cutover margin, not canon requirement.
+
+---
+
+## Why No Time Bound
+
+oddkit's hosted service receives sparse, mostly maintainer-driven traffic. "Wait 24 hours and check organic coverage" is a pattern borrowed from systems where organic traffic actually fills the sample space. Here it does not. A 24-hour window after promotion produces a dataset dominated by maintainer test calls and a handful of synthetic probes — the same data the smoke pass produces immediately, just delayed.
+
+Time bounds are appropriate for systems where the question is whether the wrapper behaves correctly under unforeseen load patterns the operator cannot manufacture — a real concern for services running thousands of QPS across heterogeneous clients. oddkit answers a smaller question: do the numbers come out right for the payloads we send? That is fully answered by deliberate exercise.
+
+Removing the time bound also removes a class of failure mode: orchestrators waiting passively for a soak window to mature, mistaking elapsed time for validation work. The smoke pass is active verification with a definite endpoint.
+
+---
+
+## Why Synthetic Is Enough
+
+The Emission Contract specifies in-memory measurement after Zod validation and before MCP transport framing. The wrapper does not care whether the call originated from a manufactured smoke probe or a real consumer; it sees the same `args` object and the same `{ content: [...] }` envelope. Synthetic and organic traffic produce identical telemetry rows when the payload sizes match.
+
+Synthetic traffic has an additional advantage that organic does not: the smoke driver knows the exact request and response bytes locally. Organic traffic only produces emitted values in the dataset; the ground truth is not directly observable. Verification against organic load is necessarily a sanity check against expected ranges, not against known values. The smoke pass is the stricter test.
+
+---
+
+## Cross-Surface Coverage
+
+The wrapper deploys to whichever surface receives the code. Currently that is two surfaces:
+
+- **Main preview** at `https://main-oddkit.klappy.workers.dev/mcp` — auto-deployed by Cloudflare on every merge to `main` in `klappy/oddkit`.
+- **Production** at `https://oddkit.klappy.dev/mcp` — deployed when the `main → prod` promotion PR merges.
+
+Each surface must be smoke-verified independently. Verifying main preview does not verify prod; the surfaces run independent worker versions and could in principle diverge.
+
+When the program adds TruthKit or any other oddkit-pattern MCP server, the same gate applies to each of those surfaces.
+
+---
+
+## Relationship to release-validation-gate Rule 2
+
+`klappy://canon/constraints/release-validation-gate` Rule 2 requires fresh-context validator dispatch on promotion PRs that touch load-bearing surface. "Load-bearing surface" is defined there by response-envelope changes, new or removed tool registrations, governance file reads, matcher algorithm changes, and `workers/src/orchestrate.ts` modifications. The telemetry wrapper does not change any of these — callers observe identical responses; no tools are added or removed; no governance reads change.
+
+A wrapper change is therefore arguably outside Rule 2's trigger. The orchestrator may smoke-verify directly per this gate without dispatching a fresh-context validator, provided the smoke pass shows accurate numbers across every tool on every surface.
+
+If a future wrapper change *does* touch load-bearing surface (for example, exposing new envelope fields to callers), Rule 2 fires in addition to this gate, and both must be satisfied.
+
+---
+
+## Receipts
+
+- `klappy://canon/observations/2026-05-14-telemetry-coverage-gap-quantified` — the diagnostic that motivated the Emission Contract and exposed how prior time-bound validation hid the actual coverage problem.
+- `klappy://canon/decisions/DR-20260514-0001-telemetry-wrapper-pattern` — decision record for the wrapper architecture this gate verifies.
+- `klappy://canon/observations/performed-prudence-anti-pattern` — the failure mode this gate is structured to avoid (statistical ceremony around a deterministic question).
+- `klappy://odd/handoffs/2026-05-14-telemetry-coverage-completeness` — original handoff whose "24-hour soak" framing this canon supersedes.
+
+---
+
+## See Also
+
+- `klappy://canon/constraints/telemetry-governance` — the Emission Contract this gate verifies.
+- `klappy://canon/constraints/release-validation-gate` — separate constraint covering promotion-PR fresh-context review.
+- `klappy://canon/constraints/measure-before-you-object` — the methodology that argues against theoretical objections to empirical answers; applies here against statistical-threshold arguments to deterministic questions.

Reviewed by Cursor Bugbot for commit 3f52f07.

Comment thread canon/constraints/telemetry-validation-gate.md Outdated
@klappy klappy merged commit d89e98e into main May 16, 2026
3 checks passed
@klappy klappy deleted the feat/canon-telemetry-validation-gate branch May 16, 2026 00:22